Lesson 4 - Web API

Requesting information from the web

Python 'requests' module.

This module provides functions to send an HTTP request and get the response from the server.
Requests is a third-party module. If it is not installed, run "pip install requests" in the Mac terminal or in the Windows command prompt.
http://docs.python-requests.org/en/master/user/quickstart/#make-a-request

Using 'requests' module

Use the requests module to make an HTTP request to http://www.github.com/ibm

Check the status of the request
Display the response header information

Get status code for the request



In [1]:

    
url = 'http://www.github.com/ibm'
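One possible way to fill in this cell, offered as a sketch rather than the lesson's official answer: `requests.get` returns a response object whose `status_code` attribute holds the numeric HTTP status. The helper name `describe_status` is hypothetical and introduced only for illustration; it uses the standard library's `http.HTTPStatus` to turn the number into a readable phrase. The network call itself is left in comments because it needs the requests package and internet access.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a readable label such as '200 OK' for an HTTP status code."""
    # HTTPStatus raises ValueError for codes it does not know
    return f"{code} {HTTPStatus(code).phrase}"


print(describe_status(200))

# With requests installed and network access, the code comes from the response:
#     import requests
#     response = requests.get('http://www.github.com/ibm')
#     print(describe_status(response.status_code))
```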

Get header information



In [ ]:

Get the body Information



In [ ]:
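The header and body cells could be filled in along these lines. This is a sketch under assumptions: `describe_response` is a hypothetical helper, and the `fake` object is a hand-written stand-in that only mimics the three attributes of a requests response used here (`status_code`, `headers`, `text`), so the example runs without a network connection.

```python
from types import SimpleNamespace


def describe_response(response, preview_chars=200):
    """Summarise the parts of an HTTP response this lesson looks at."""
    return {
        'status': response.status_code,
        # requests stores headers in a case-insensitive, dict-like object
        'content_type': response.headers.get('Content-Type'),
        # response.text is the body decoded to a string
        'body_preview': response.text[:preview_chars],
    }


# Offline stand-in with the same three attributes as a requests response
fake = SimpleNamespace(
    status_code=200,
    headers={'Content-Type': 'text/html; charset=utf-8'},
    text='<!DOCTYPE html><html>...</html>',
)
summary = describe_response(fake)
print(summary['content_type'])

# With requests installed and network access:
#     import requests
#     print(describe_response(requests.get('http://www.github.com/ibm')))
```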

Using a Web API to Collect Data

An application programming interface (API) is a set of functions that you call to get access to some service.
A web API is basically a list of functions and data structures for interacting with a website's data.

The way these work is similar to viewing a web page. When you point your browser to a website, you do it with a URL (http://www.github.com/ibm for instance). GitHub sends you back data containing HTML, CSS, and JavaScript. Your browser uses this data to construct the page that you see. The API works similarly: you request data with a URL (https://api.github.com/orgs/ibm), but instead of getting HTML and such, you get data formatted as JSON.
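To make the difference between the two kinds of URL concrete, here is a small sketch that builds the API address for an organisation. The names `API_ROOT` and `org_api_url` are hypothetical, chosen for this example only.

```python
API_ROOT = 'https://api.github.com'


def org_api_url(org):
    """Build the API URL for a GitHub organisation such as 'ibm'."""
    return f'{API_ROOT}/orgs/{org}'


print(org_api_url('ibm'))
```

Requesting that address returns a JSON document describing the organisation, whereas the browser address returns an HTML page.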

Access data using web APIs

Write a program to access all the public OSS projects hosted by IBM on github.com using the web API.

Step 1: Access the Web API service and check rate limits



In [9]:









    



Response status - OK 
53
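A sketch of how the rate limit can be read, assuming the API reports it in the `X-RateLimit-Limit`, `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers. The helper `rate_limit_info` is a hypothetical name, and the sample header values are hand-written for an offline run (the 53 simply echoes the output above).

```python
from datetime import datetime, timezone


def rate_limit_info(headers):
    """Read GitHub's rate-limit headers from a response's headers."""
    return {
        'limit': int(headers['X-RateLimit-Limit']),
        'remaining': int(headers['X-RateLimit-Remaining']),
        # The reset time is sent as seconds since the Unix epoch (UTC)
        'resets_at': datetime.fromtimestamp(
            int(headers['X-RateLimit-Reset']), tz=timezone.utc),
    }


# Hand-written sample headers for an offline demonstration
sample_headers = {
    'X-RateLimit-Limit': '60',
    'X-RateLimit-Remaining': '53',
    'X-RateLimit-Reset': '1520000000',
}
info = rate_limit_info(sample_headers)
print('Requests remaining:', info['remaining'])

# With requests installed and network access:
#     import requests
#     response = requests.get('https://api.github.com/orgs/ibm')
#     if response.status_code == 200:
#         print('Response status - OK')
#     print(rate_limit_info(response.headers)['remaining'])
```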

Step 2: Authentication (if required)

Authenticate requests to increase the API request limit. Access data that requires authentication.

Basic Authentication

Pass the user ID and password as parameters to the requests.get function.
This is a little risky and prone to hacking, so create a dummy user ID and password for it.

OAUTH

OAuth 2 is an authorization framework that enables a user to connect to their account using a third-party application.
While this is more secure than basic authentication (i.e. passing the user ID and password when you make an HTTP request), it is a little more difficult to code.
It needs a personal token and a consumer key to be generated and passed to the web server.

Unfortunately, different websites have different ways of generating and using the token and consumer keys. Hence we will need to write the authorization code for each website separately. However, every website provides detailed information on how you can generate and send the token and keys.



In [ ]:
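The two approaches can be sketched as follows. Several things here are assumptions: `token_headers` is a hypothetical helper, `GITHUB_TOKEN` is an assumed environment variable name, and the credentials are dummies. Which schemes a site accepts varies, so its API documentation should be checked before relying on either form.

```python
import os


def token_headers(token):
    """Build request headers that carry a personal access token."""
    return {'Authorization': f'Bearer {token}'}


# GITHUB_TOKEN is an assumed environment variable name; keeping the token
# out of the notebook avoids sharing it by accident.
token = os.environ.get('GITHUB_TOKEN', 'dummy-token')
headers = token_headers(token)

# With requests installed and network access:
#     import requests
#     url = 'https://api.github.com/orgs/ibm'
#     # Basic authentication: user ID and password as a tuple
#     response = requests.get(url, auth=('dummy_user', 'dummy_password'))
#     # Token authentication: the token travels in a request header
#     response = requests.get(url, headers=headers)
```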

Step 3: Parse the response

The json module gives us functions to convert the JSON response into a Python-readable data structure.

Write a program to get the number of OSS projects started by IBM



In [13]:









    



Response status - OK 
The number of public repos are :  851
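A minimal sketch of the parsing step, assuming the organisation's JSON contains a `public_repos` field. The helper `public_repo_count` is a hypothetical name, and `sample_body` is a hand-written stand-in for a real response body so the example runs offline.

```python
import json


def public_repo_count(body_text):
    """Decode a JSON response body and return its 'public_repos' field."""
    org_info = json.loads(body_text)  # JSON object -> Python dict
    return org_info['public_repos']


# Hand-written sample body; a real response has many more fields
sample_body = '{"login": "IBM", "public_repos": 851}'
print('The number of public repos is:', public_repo_count(sample_body))

# With a requests response, response.json() performs the same decoding:
#     org_info = response.json()
```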

Step 4: Follow the URL information from the Web API to find what you need

Let us collect the information regarding the different projects started by IBM



In [ ]:
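One way to approach this cell, as a sketch: the organisation's JSON is assumed to carry a `repos_url` field pointing at the list of repositories, and each repository object is assumed to have `id` and `name` fields. The helper `repo_rows` is hypothetical, and the sample data is made up so the example runs offline.

```python
def repo_rows(repos):
    """Turn a decoded list of repository objects into (id, name) pairs."""
    return [(repo['id'], repo['name']) for repo in repos]


# Hand-written samples shaped like the API's JSON; ids and names are made up
org_info = {'login': 'IBM',
            'repos_url': 'https://api.github.com/orgs/ibm/repos'}
repos = [
    {'id': 1, 'name': 'example-repo-a'},
    {'id': 2, 'name': 'example-repo-b'},
]

print('Next URL to request:', org_info['repos_url'])
print(repo_rows(repos))

# With requests installed and network access:
#     import requests
#     org_info = requests.get('https://api.github.com/orgs/ibm').json()
#     repos = requests.get(org_info['repos_url']).json()
```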

Step 5: Paginate to get data from other pages

Traverse the pages if the data is spread across multiple pages



In [ ]:
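A sketch of page traversal, assuming the API announces the next page in a `Link` response header, which requests exposes as the `response.links` dictionary. The function `fetch_all` is a hypothetical helper that takes the fetching function as an argument, so it can be tried offline with the made-up `fake_pages` below and later called with `requests.get`.

```python
from types import SimpleNamespace


def fetch_all(url, get):
    """Collect items from every page, following 'next' links.

    `get` is any callable that behaves like requests.get.
    """
    items = []
    while url:
        response = get(url)
        items.extend(response.json())
        # requests parses the Link header into response.links
        url = response.links.get('next', {}).get('url')
    return items


# Offline stand-in for requests.get: two made-up pages of results
fake_pages = {
    'page1': SimpleNamespace(json=lambda: [{'id': 1}, {'id': 2}],
                             links={'next': {'url': 'page2'}}),
    'page2': SimpleNamespace(json=lambda: [{'id': 3}], links={}),
}
all_items = fetch_all('page1', fake_pages.get)
print(len(all_items))

# With requests installed and network access:
#     import requests
#     repos = fetch_all('https://api.github.com/orgs/ibm/repos?per_page=100',
#                       requests.get)
```

The loop stops when a page has no "next" link, so the caller does not need to know the number of pages in advance.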

3. Write a CSV

Let's try to write the repos into a CSV file.

Write code to append data row-wise to a CSV file



In [ ]:

    
import csv
WRITE_CSV = "C:/Users/kmpoo/Dropbox/HEC/Teaching/Python for PhD Mar 2018/python4phd/Session 2/ipython/Repo_csv.csv"
with open(WRITE_CSV, 'at',encoding = 'utf-8', newline='') as csv_obj:
    write = csv.writer(csv_obj) # Note it is csv.writer not reader
    
    write.writerow(['REPO ID','REPO NAME'])



In [ ]:

    
from google.colab import drive
from google.colab import files
drive.mount('/content/drive/')
uploaded = files.upload()

What do you think will happen if we use 'wt' as mode instead of 'at' ?
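A small experiment can help check your answer. In this sketch, the hypothetical helper `write_twice` opens the same new file twice in the given mode, writes one line each time, and returns what the file contains at the end. It works in a temporary folder so no real file is touched.

```python
import os
import tempfile


def write_twice(mode):
    """Open a fresh file twice in the given mode and return its lines."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'demo.csv')
        for text in ('first\n', 'second\n'):
            with open(path, mode, encoding='utf-8', newline='') as handle:
                handle.write(text)
        with open(path, 'rt', encoding='utf-8', newline='') as handle:
            return handle.read().splitlines()


print(write_twice('at'))
print(write_twice('wt'))
```

Comparing the two printed lists shows whether each mode keeps or discards what was already in the file.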

Write a program that saves the IBM repositories into the CSV file, so that each row is a new repository, column 1 is the ID, and column 2 is the name.



In [ ]:

    
#Enter code here
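One possible shape for a solution, offered as a sketch rather than a model answer. The helper `save_repos` is a hypothetical name, the repositories are made up, and the file is written to a temporary folder instead of the path used earlier. In a full solution the list would come from the API responses collected in the previous steps.

```python
import csv
import os
import tempfile


def save_repos(path, repos):
    """Append one CSV row per repository: column 1 the ID, column 2 the name."""
    with open(path, 'at', encoding='utf-8', newline='') as csv_obj:
        writer = csv.writer(csv_obj)
        for repo in repos:
            writer.writerow([repo['id'], repo['name']])


# Made-up repositories shaped like the API's JSON objects
sample_repos = [
    {'id': 1, 'name': 'example-repo-a'},
    {'id': 2, 'name': 'example-repo-b'},
]

# A temporary folder keeps the demonstration away from real files
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'Repo_csv.csv')
    save_repos(demo_path, sample_repos)
    with open(demo_path, 'rt', encoding='utf-8', newline='') as csv_obj:
        saved_rows = list(csv.reader(csv_obj))

print(saved_rows)
```

Reading the file back with `csv.reader` returns every value as a string, which is why the IDs come back as text.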